Bing Zhao

BilliardPhys-Bench: Benchmarking Physical Reasoning and Visual Dynamics of Multimodal LLMs

May 29, 2026, via arXiv

CrystalXRD-Bench: Benchmarking Vision-Language Models for XRD Peak Indexing Across Diverse Crystalline Materials

May 28, 2026, via arXiv

Qwen-Image-Bench: From Generation to Creation in Text-to-Image Evaluation

May 27, 2026, via arXiv

Qwen-Image-2.0 Technical Report

May 11, 2026, via arXiv

Long-CODE: Isolating Pure Long-Context as an Orthogonal Dimension in Video Evaluation

Apr 19, 2026, via arXiv

FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning

Apr 04, 2026, via arXiv

IndustryCode: A Benchmark for Industry Code Generation

Apr 03, 2026, via arXiv

MolQuest: A Benchmark for Agentic Evaluation of Abductive Reasoning in Chemical Structure Elucidation

Mar 26, 2026, via arXiv

Logics-Parsing-Omni Technical Report

Mar 12, 2026, via arXiv

SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration

Mar 04, 2026, via arXiv